-
-
Notifications
You must be signed in to change notification settings - Fork 629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove TimeDelta serialization_type
parameter; deserialize from float; improve serialization perf and accuracy
#2654
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR ❤️ This seems like the right behavior.
Lgtm but I'll give @lafrech some time to take a look in case I'm not thinking of some case where end users might rely on the current behavior.
[I enabled issues again. The spam streak is over on flask-smorest. Let's hope it is here too.] I don't have a full understanding of the issue but in case it went unnoticed, here are related issues/PRs:
Also, the docstring should be updated if we merge. |
Hi @lafrech 👋 You're right, the docstring needed some work. Please see the last commit. I hope that also clarifies the bug that this PR fixes: needlessly using |
ea0509a
to
0755fe1
Compare
Docs update looks good to me. Thanks @ddelange . Thanks for digging up the original issue and PRs @lafrech. Looking back at those, I think allowing truncation to an integer was an oversight on my part rather than an explicit decision. Even the ISO8601 spec mentioned in #105 (comment) allows for fractional components:
@lafrech @deckar01 Do we consider this a breaking change? Even though this is a much smaller change than previous major version bumps, I'm not opposed to a major version bump with some explanatory text in the changelog that says that the change is very limited in scope and only affects users of the |
fwiw, this change only affects (helps) users that were serializing using I would say this does not warrant a major version bump, but rather a bugfix release. |
It would also affect users who didn't explicitly pass |
right. the good ol' rely on a bug motto. programming sucks... |
lol, yeah... i guess i'm not 100% sure it's a bug, given that there was an explicit test for the current behavior. i'm like 80% sure it was oversight, but it was long ago so want to give a bit of time for others to chime in before shipping the change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure.
One of our concerns when designing a field is how values round-trip.
Current behaviour lets the user decide whether the serialized version should be int
or float
, and considers it is being given serialized values accordingly. If values are serialized as int
, then the type that is expected as input when deserializing is int
.
The original name of the parameter was serde_type
, not very nice but showing it works both ways.
Before this PR:
# TimeDelta(serialization_type=int)
12 -> timedelta(seconds=12) -> 12
12.42 -> timedelta(seconds=12) -> 12
# TimeDelta(serialization_type=float)
12 -> timedelta(seconds=12) -> 12.0
12.42 -> timedelta(seconds=12, microseconds=420000) -> 12.42
After this PR:
# TimeDelta(serialization_type=int)
12 -> timedelta(seconds=12) -> 12
12.42 -> timedelta(seconds=12, microseconds=420000) -> 12
# TimeDelta(serialization_type=float)
12 -> timedelta(seconds=12) -> 12.0
12.42 -> timedelta(seconds=12, microseconds=420000) -> 12.42
From this perspective, I'm not sure the latter is less surprising behaviour.
The PR makes the field a bit more permissive on inputs, accepting floats when expecting integers. This is not in line with Integer
field which truncates floats.
If floats are to be expected, then serialization_type=float
should be used.
If the issue is that TimeDelta
defaults to int
then maybe the default could be changed in next major version. Until then, nothing prevents the user to specify float
, right?
I might be totally mistaken, in which case please ignore.
src/marshmallow/fields.py
Outdated
Allow (de)serialization to `float` through use of a new `serialization_type` parameter. | ||
`int` is the default to retain previous behaviour. | ||
Allow serialization to `float` through use of a new `serialization_type` parameter. | ||
Defaults to `int` for backwards compatibility. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a new versionchanged line here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never truncate a `float` to `int` during deserialization.
too concise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe
Never truncate a `float` to an `int` before deserializing it into a :class:`datetime.timedelta`.
Thanks for your elaborate review!
To me the latter definitely looks 'right'. From another perspective: that float came from marshmallow (starting off with a timedelta). It got serialized and sent elsewhere for deserialization. When the receiving party deserializes, in the latter example he doesn't need prior knowledge about the sender's settings. Deserialization has now become robust. |
Is there anything wrong with I mean does the change in this PR allow a use case that can't be achieved with current parameters? Or is the only benefit to make it the default choice? |
re: anything wrong - my argument here is that serialization_type should be used only for serialization and not for deserialization. but that's about it :) |
So would it be fine if the parameter was named If it is just that, we could use a better naming and deprecate the old one. |
from my side there's no need for a breaking change here, this change just seemed like a nice robustness improvement to me (not considering this breaking). it only came up because a user was surprised that his float was being silently truncated when feeding into @sloria's environs package. since that package only ever deserializes (values from unknown sources), this seemed like the right way forward. but feel free to close / take over if you do consider this a 'bad' change, no hard feelings :) will look uglier than now, but environs could fix this without any PR om marshmallow by subclassing |
I generally agree with @ddelange's comment in #2654 (comment), and the proposed change certainly makes sense for the use case in environs. On one hand, we might as well be robust on input. |
Another possible path would be to
something like diff --git a/src/marshmallow/fields.py b/src/marshmallow/fields.py
index b564aee..19d82f6 100644
--- a/src/marshmallow/fields.py
+++ b/src/marshmallow/fields.py
@@ -1487,6 +1487,8 @@ def __init__(
self,
precision: str = SECONDS,
serialization_type: type[int | float] = int,
+ *,
+ deserialization_type: type[int | float] = int,
**kwargs,
):
precision = precision.lower()
@@ -1508,9 +1510,12 @@ def __init__(
if serialization_type not in (int, float):
raise ValueError("The serialization type must be one of int or float")
+ if deserialization_type not in (int, float):
+ raise ValueError("The deserialization type must be one of int or float")
self.precision = precision
self.serialization_type = serialization_type
+ self.deserialization_type = deserialization_type
super().__init__(**kwargs)
def _serialize(self, value, attr, obj, **kwargs):
@@ -1527,8 +1532,15 @@ def _serialize(self, value, attr, obj, **kwargs):
return value.total_seconds() / base_unit.total_seconds()
def _deserialize(self, value, attr, data, **kwargs):
+ if isinstance(value, float) and self.deserialization_type is not float:
+ warnings.warn(
+ f"float value for attribute {attr!r} will truncated to an integer value in {self.precision}. "
+ + "It will be deserialized as a float in marshmallow 3.25.0.",
+ UserWarning,
+ stacklevel=2,
+ )
try:
- value = float(value)
+ value = self.deserialization_type(value)
except (TypeError, ValueError) as error:
raise self.make_error("invalid") from error
this warns potentially-affected users of the change. but maybe it's not worth the added API surface? ¯\(ツ)/¯ |
We might consider deprecating the type arg while we are at it. The truncation can be enveloped or a custom field and does not seem to warrant a convenience API. |
do you suggest to deprecate |
It seems to me that the current code allows everything using the right
parameters and it's just the default behaviour that is being discussed,
so there's no need to rush fixing/improving this in 3.x by adding
another parameter or a deprecation notice.
Simplifying things and removing parameters can be done in a 4.x change.
We could pick the low-hanging fruits from the 4.0 milestone and ship
soonish, postponing the other issues to 5.0.
…--
Jérôme
|
@lafrech if i'm understanding your suggestion correctly:
in environs, if this is the proposal, i'm good with it |
Yes, scratch this and keep things as is in 3.x, then drop
`serialization_type` and only support the `float` case in 4.0 as
@deckar01 suggests.
And rather than postpone 4.0 endlessly, select PRs from the 4.0
milestone that can be finalized easily.
|
fd171e8
to
210339f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK! now targeting 4.0: I've demolished serialization_type
(with deprecation warning), updated tests and added to docstring:
.. versionchanged:: 4.0.0
Deprecate `serialization_type` parameter, always serialize to float.
serialization_type
parameter
0742e3f
to
8d166b1
Compare
8d166b1
to
3470680
Compare
if precision not in self._unit_to_microseconds_mapping: | ||
units = ", ".join(self._unit_to_microseconds_mapping) | ||
msg = f"The precision must be one of: {units}." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unified this msg formatting with the Enum field ref.
# limit float arithmetics to a single division to minimize precision loss | ||
microseconds: int = utils.timedelta_to_microseconds(value) | ||
microseconds_per_unit: int = self._unit_to_microseconds_mapping[self.precision] | ||
return microseconds / microseconds_per_unit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apart from float precision issues with the old total_seconds
approach, the new approach is 5x faster:
In [1]: from marshmallow import utils
...: from datetime import timedelta
...:
...: WEEKS = "weeks"
...: DAYS = "days"
...: HOURS = "hours"
...: MINUTES = "minutes"
...: SECONDS = "seconds"
...: MILLISECONDS = "milliseconds"
...: MICROSECONDS = "microseconds"
...:
...: _unit_to_microseconds_mapping = {
...: WEEKS: 1000000 * 60 * 60 * 24 * 7,
...: DAYS: 1000000 * 60 * 60 * 24,
...: HOURS: 1000000 * 60 * 60,
...: MINUTES: 1000000 * 60,
...: SECONDS: 1000000,
...: MILLISECONDS: 1000,
...: MICROSECONDS: 1,
...: }
...:
...: precision = WEEKS
...:
...: def serialize_old(value):
...: base_unit = timedelta(**{precision: 1})
...: return value.total_seconds() / base_unit.total_seconds()
...:
...: def serialize_new(value):
...: microseconds: int = utils.timedelta_to_microseconds(value)
...: microseconds_per_unit: int = _unit_to_microseconds_mapping[precision]
...: return microseconds / microseconds_per_unit
...:
...: value = timedelta(weeks=1, microseconds=1)
In [2]: %timeit serialize_old(value)
558 ns ± 3.16 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [3]: %timeit serialize_new(value)
113 ns ± 0.583 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
serialization_type
parameterserialization_type
parameter
serialization_type
parameterserialization_type
parameter; deserialize from float; improve serialization perf and accuracy
This PR allows propagating floats/strings with decimals to TimeDelta.deserialize without silently downcasting the input to int. Currently,
serialization_type
is not only being used for serialization, but also being used for deserialization, which adds no added value: a float cast can always safely be used for deserialization and is backwards compatible (but fixes this bug).originally reported here: sloria/environs#372
MRE:
After this PR:
I added one test for string input, and changed the existing test for float input.
There are already
math.isclose
tests when passing floats to microseconds (which will be rounded by the datetime.timedelta constructor).